LightRNN: Memory and Computation-Efficient Recurrent Neural Networks
Recurrent neural networks (RNNs) have achieved state-of-the-art performance in many natural language processing tasks, such as language modeling and machine translation. However, when the vocabulary is large, the RNN model becomes very big (e.g., possibly beyond the memory capacity of a GPU device) and its training becomes very inefficient. In this work, we propose a novel technique to tackle this challenge. The key idea is to use 2-Component (2C) shared embedding for word representations. We allocate every word in the vocabulary into a table, each row of which is associated with a vector, and each column associated with another vector. Depending on its position in the table, a word is jointly represented by two components: a row vector and a column vector.
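To make the table layout concrete, below is a minimal Python sketch of the 2C shared-embedding idea on a toy vocabulary; the helper names (`word_position`, `embed_word`) and the random initialization are illustrative assumptions, not the paper's implementation.

```python
import math
import numpy as np

# Minimal sketch of the 2-Component (2C) shared embedding idea on a toy
# vocabulary with random vectors (illustrative only, not the paper's code).

vocab = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran", "away"]
table_size = math.ceil(math.sqrt(len(vocab)))    # words fill a table_size x table_size table

embed_dim = 4
row_vectors = np.random.randn(table_size, embed_dim)  # one shared vector per table row
col_vectors = np.random.randn(table_size, embed_dim)  # one shared vector per table column

def word_position(word_id):
    """Map a word id to its (row, column) cell in the table."""
    return word_id // table_size, word_id % table_size

def embed_word(word_id):
    """A word is jointly represented by its row vector and its column vector."""
    r, c = word_position(word_id)
    return row_vectors[r], col_vectors[c]

row_vec, col_vec = embed_word(vocab.index("cat"))
```

The saving comes from sharing: roughly 2*sqrt(|V|) vectors cover a vocabulary of |V| words, instead of one independent vector per word.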
Imaging with super-resolution in changing random media
Christie, Alexander, Leibovich, Matan, Moscoso, Miguel, Novikov, Alexei, Papanicolaou, George, Tsogka, Chrysoula
High-resolution imaging from array data in unknown inhomogeneous ambient media requires estimating both the medium properties and the object characteristics. For diverse measurements collected from different sources in different, changing media, we introduce in this paper an algorithm that recovers the ambient media properties needed for high-resolution imaging as well as the source locations and strengths that constitute the imaging target. This algorithm extends and improves upon our previous work on imaging through random media using array data. Previously, we addressed imaging through a single unknown random medium, either weakly scattering [1] or strongly scattering [2].
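For readers less familiar with array imaging, the following Python sketch forms a basic back-propagation (migration-style) image of a single point source in a known homogeneous medium. It is only a baseline illustration of imaging from array data; it is not the paper's algorithm, which must also estimate the unknown, changing medium. All names and parameter values are hypothetical.

```python
import numpy as np

# Baseline illustration: locate one point source from array measurements in a
# known homogeneous medium by normalized back-propagation of the data.

k = 2 * np.pi / 0.5                                          # wavenumber for wavelength 0.5
receivers = np.stack([np.linspace(-5, 5, 50), np.zeros(50)], axis=1)  # linear array
source = np.array([1.0, 20.0])                               # true source location
strength = 2.0                                               # true source strength

def green(x, y):
    """Free-space Green's-function model exp(i k r) / r (constants dropped)."""
    r = np.linalg.norm(x - y, axis=-1)
    return np.exp(1j * k * r) / r

data = strength * green(receivers, source)                   # simulated array data

def migrate(y):
    """Correlate the data with the (normalized) Green's vector of a trial point y."""
    g = green(receivers, y)
    return abs(np.vdot(g, data)) / np.linalg.norm(g)

# Evaluate the imaging functional on a search grid; it peaks at the true source.
xs, ys = np.meshgrid(np.linspace(-5, 5, 101), np.linspace(15, 25, 101))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
image = np.array([migrate(y) for y in grid]).reshape(xs.shape)
peak = grid[np.argmax(image)]                                # estimated source location
```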
We thank all the reviewers for their time and valuable comments. Reviewer #1 asks us to "provide an algorithm to output a distribution that's close to the target, even if b has negative components." We will mention this in the paper; it is an interesting direction for future research. "What happens when we increase the number of layers?"
DHO$_2$: Accelerating Distributed Hybrid Order Optimization via Model Parallelism and ADMM
Gu, Shunxian, You, Chaoqun, Ren, Bangbang, Luo, Lailong, Xia, Junxu, Guo, Deke
Scaling deep neural network (DNN) training to more devices can reduce time-to-solution, but this is impractical for users with limited computing resources. FOSI, a hybrid order optimizer, converges faster than conventional optimizers by exploiting both gradient information and curvature information when updating the DNN model, which makes it a promising candidate for accelerating DNN training in the resource-constrained setting. In this paper, we explore its distributed design, namely DHO$_2$, which includes distributed calculation of curvature information and model updates with partial curvature information, accelerating DNN training with a low memory burden. To further reduce the training time, we design a novel strategy that parallelizes the calculation of curvature information and the model update on different devices. Experimentally, our distributed design achieves an approximately linear reduction of the memory burden on each device as the number of devices increases, and a $1.4\times$ to $2.1\times$ speedup in total training time compared with other distributed designs based on conventional first- and second-order optimizers.
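As a rough illustration of the model-parallel partition described above, the sketch below shards the parameters across devices and updates each shard using only its local (partial) curvature information. The diagonal squared-gradient curvature proxy is a deliberately simplified stand-in, not the estimator used by FOSI or DHO$_2$, and all names (`local_step`, `num_devices`) are hypothetical.

```python
import numpy as np

# Toy model-parallel sketch: each "device" owns a parameter shard, estimates
# curvature only for that shard, and applies a locally preconditioned update.
# The squared-gradient diagonal below is an illustrative proxy, NOT the
# curvature estimator of FOSI or DHO_2.

num_devices = 4
params = np.random.randn(1024)                       # full model parameters
grads = np.random.randn(1024)                        # stand-in gradient

param_shards = np.array_split(params, num_devices)   # model-parallel partition
grad_shards = np.array_split(grads, num_devices)

def local_step(shard, grad_shard, lr=0.01, eps=1e-8):
    """Update one shard with only its local (partial) curvature estimate."""
    curvature = grad_shard ** 2                       # toy diagonal curvature proxy
    return shard - lr * grad_shard / np.sqrt(curvature + eps)

# Each device runs local_step on its own shard, so per-device curvature memory
# scales with the shard size, i.e. roughly 1/num_devices of the full model.
param_shards = [local_step(p, g) for p, g in zip(param_shards, grad_shards)]
params = np.concatenate(param_shards)
```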